
Align KTO with DPO: Remove enforcement of causal language models#5701

Merged
albertvillanova merged 1 commit into main from align-kto-dpo-rm-encoder-decoder on May 5, 2026

@albertvillanova albertvillanova commented May 5, 2026

Member

Align KTO with DPO: Remove enforcement of causal language models.

This PR makes a small change to the KTOTrainer by removing the check that raised an error when an encoder-decoder model was used. As a result, the restriction that KTO only supports causal language models is no longer enforced in the code.
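For readers unfamiliar with this kind of guard, the following is a minimal sketch of what a causal-LM-only check can look like. It is not the code removed by this PR: the helper name `check_causal_lm_only` and the error message are hypothetical, and the stand-in objects only mimic the `config.is_encoder_decoder` flag that Transformers model configs expose.

```python
from types import SimpleNamespace


def check_causal_lm_only(model):
    """Hypothetical guard: reject models whose config is flagged as encoder-decoder."""
    config = getattr(model, "config", None)
    # Treat a missing config or a missing flag as "not encoder-decoder".
    if getattr(config, "is_encoder_decoder", False):
        raise ValueError(
            "Only causal language models are supported; got an encoder-decoder model."
        )


# Stand-ins for real models (assumption: only the config flag matters to the guard).
decoder_only = SimpleNamespace(config=SimpleNamespace(is_encoder_decoder=False))
encoder_decoder = SimpleNamespace(config=SimpleNamespace(is_encoder_decoder=True))

check_causal_lm_only(decoder_only)  # passes silently

try:
    check_causal_lm_only(encoder_decoder)
except ValueError as err:
    print(f"rejected: {err}")
```

With a guard of this shape removed, an encoder-decoder model gets past initialization, so any incompatibility would only show up later, for example during tokenization or loss computation.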

Part of:

@qgallouedec, shouldn't I enforce this in DPO instead?


Note

Medium Risk
Removes a guardrail in KTOTrainer that previously blocked encoder-decoder models, which may expose unsupported/untested model architectures to KTO training and lead to runtime errors or incorrect loss computation.

Overview
KTOTrainer no longer raises an error when the provided model is configured as encoder-decoder, aligning its initialization behavior with DPO by removing the causal-LM-only enforcement.

Reviewed by Cursor Bugbot for commit 577037c.

@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@albertvillanova albertvillanova merged commit babb16b into main May 5, 2026
6 checks passed
@albertvillanova albertvillanova deleted the align-kto-dpo-rm-encoder-decoder branch May 5, 2026 14:27
3 participants